MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets
نویسندگان
چکیده
Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e.contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accomodate for the mosaic genome structure of many bacteriophages. The method is demonstrated to out-perform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
منابع مشابه
SPA: a short peptide assembler for metagenomic data
The metagenomic paradigm allows for an understanding of the metabolic and functional potential of microbes in a community via a study of their proteins. The substrate for protein identification is either the set of individual nucleotide reads generated from metagenomic samples or the set of contig sequences produced by assembling these reads. However, a read-based strategy using reads generated...
متن کاملThe Phylogenetic Diversity of Metagenomes
Phylogenetic diversity--patterns of phylogenetic relatedness among organisms in ecological communities--provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relati...
متن کاملMEGAN analysis of metagenomic data.
Metagenomics is the study of the genomic content of a sample of organisms obtained from a common habitat using targeted or random sequencing. Goals include understanding the extent and role of microbial diversity. The taxonomical content of such a sample is usually estimated by comparison against sequence databases of known sequences. Most published studies use the analysis of paired-end reads,...
متن کاملGene Finding for Microbial Metagenomics
Metagenomics is the study of genomes directly sequenced from an environment. The goal of metagenomics is to understand the ecosystems by identifying the community members and the genes in the environment, and microbial organisms are a main target of the research [2] [3]. Microbial metagenomics enables to access uncultured microbial organisms, thus obtained sequences are composed of various spec...
متن کاملComputational approaches to predict bacteriophage–host relationships
Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in meta...
متن کامل